TR-2003009: A Hierarchical Projection Pursuit Clustering Algorithm

Authors

  • Jayson E. Rome
  • Alexei D. Miasnikov
  • Robert M. Haralick

Abstract

We characterize clusters as regions of high density separated by regions that are sparse. Given a collection of observations $X = \{x_i\}$, $x_i \in \mathbb{R}^d$, $|X| = N$, we would like to find clusters in data sets in which $d$ and possibly $N$ are large, in which no parametric distribution is known, and in which clusters may take on arbitrary shapes. By exploiting the downward closure property of density (a set of points that is dense in the full space is also dense in every lower dimensional projection), the search for interesting structure in a high dimensional space can be reduced to a search for structure in lower dimensional subspaces. We present a parameter-free Hierarchical Projection Pursuit Clustering (HPPC) algorithm that repeatedly bi-partitions interesting lower dimensional projections of a high dimensional dataset. We describe a projection search procedure for use with relatively high dimensional data and a projection pursuit index function based on the Kittler and Illingworth optimal threshold technique. The output of the algorithm is a decision tree whose internal nodes each store a projection and a threshold and whose leaves represent the clusters (classes). We present several cluster validation methods, which we use to evaluate the algorithm. Experiments with various real and synthetic datasets show the effectiveness of the approach.

Portions of this work were funded by a New York State Office of Science, Technology and Academic Research grant and by Syllogy Software in cooperation with the Institute for Software Design and Development of the City University of New York.
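The abstract names a projection pursuit index based on the Kittler and Illingworth optimal (minimum-error) threshold but does not define it. The following is one plausible reading and is not the authors' definition. It bins the 1-D projected values into a histogram and models the two sides of each candidate threshold $t$ as Gaussian classes with prior $P_i(t)$ and variance $\sigma_i^2(t)$. It then picks the $t$ minimizing $J(t) = 1 + 2[P_1 \ln \sigma_1 + P_2 \ln \sigma_2] - 2[P_1 \ln P_1 + P_2 \ln P_2]$. Several details are assumptions of this sketch: subtracting the log of the total variance so that indices are comparable across projections, the default of 64 bins, and the function name `kittler_illingworth_index`.

```python
import math
from typing import Sequence, Tuple


def kittler_illingworth_index(values: Sequence[float], bins: int = 64) -> Tuple[float, float]:
    """Minimum-error (Kittler-Illingworth) threshold of 1-D projected data.

    Returns (threshold, index); a lower index means a more clearly bimodal
    projection. Assumption: the index is min J(t) minus ln(total variance),
    which makes it independent of the scale of the projection.
    """
    n = len(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo, math.inf  # a constant projection cannot be split
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]

    best_t, best_j = lo, math.inf
    for t in range(1, bins):  # candidate split between bin t-1 and bin t
        stats = []
        for part in (range(0, t), range(t, bins)):
            w = sum(counts[k] for k in part)
            if w == 0:
                break
            mu = sum(counts[k] * centers[k] for k in part) / w
            var = sum(counts[k] * (centers[k] - mu) ** 2 for k in part) / w
            if var <= 0.0:
                break  # a class inside a single bin makes ln(var) undefined
            stats.append((w / n, var))
        if len(stats) < 2:
            continue
        # J(t) = 1 + sum_i P_i ln(sigma_i^2) - 2 sum_i P_i ln(P_i)
        j = 1.0 + sum(p * math.log(var) - 2.0 * p * math.log(p) for p, var in stats)
        if j < best_j:
            best_t, best_j = lo + t * width, j

    mean = sum(values) / n
    total_var = sum((v - mean) ** 2 for v in values) / n
    return best_t, best_j - math.log(total_var)
```

For example, consider values drawn from two well-separated groups, `[i * 0.01 for i in range(100)] + [10 + i * 0.01 for i in range(100)]`. They give a threshold in the gap between 0.99 and 10.0. They also give a lower index than the evenly spread values `[i * 0.01 for i in range(200)]`, which is the ordering a projection search would use to prefer the bimodal direction.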

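The abstract states that HPPC outputs a decision tree whose internal nodes store a projection and a threshold and whose leaves represent clusters. The sketch below shows one way such a tree could be represented and used to assign a new observation to a cluster. Several choices here are hypothetical: the names `Leaf`, `Node`, and `assign`, treating each projection as a single direction vector, and sending ties (`<=`) to the low branch. The tree is built by hand, not learned by HPPC.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    cluster: int  # cluster (class) label stored at a leaf


@dataclass
class Node:
    # Assumption: a projection is one direction vector w (a 1-D linear projection).
    projection: Sequence[float]  # the projected value of x is the dot product w . x
    threshold: float             # split point on the projected axis
    low: "Tree"                  # subtree for points with w . x <= threshold
    high: "Tree"                 # subtree for points with w . x > threshold


Tree = Union[Node, Leaf]


def assign(tree: Tree, x: Sequence[float]) -> int:
    """Route observation x down the tree and return its cluster label."""
    while isinstance(tree, Node):
        value = sum(w * xi for w, xi in zip(tree.projection, x))
        tree = tree.low if value <= tree.threshold else tree.high
    return tree.cluster


# A hand-built two-level tree in R^2 (illustrative values, not learned).
tree = Node([1.0, 0.0], 0.5, Leaf(0), Node([0.0, 1.0], 0.5, Leaf(1), Leaf(2)))
print(assign(tree, [0.2, 0.9]), assign(tree, [0.8, 0.1]), assign(tree, [0.8, 0.9]))  # prints "0 1 2"
```

Because each node tests a single projected value, routing a point costs one dot product per level of the tree.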
Similar resources

An Improved Projection Pursuit Clustering Model and its Application Based on Quantum-behaved Particle Swarm Optimization

Extracting biologically significant information from large volumes of gene expression data is an important research direction, and clustering algorithms have been increasingly widely applied in this area. Based on the characteristics of gene expression data, the improved projection pursuit cluster model was introduced in this area and Quantum-behaved Particle Swarm Optimization (QPSO) was put forwa...

Performing a Preprocessing Step Before Feature Extraction in the Classification of Hyperspectral Image Data

Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been successfully applied to multispectral data are not as effective for hyperspectral data. Various investigations indicate that the key problem that causes poor performance in the stochastic approaches t...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...

A Novel Hybrid Clustering Method Using an Artificial Immune System and Hierarchical Clustering

The artificial immune system (AIS) is one of the meta-heuristic algorithms used to solve complex problems. With large amounts of data, making rapid decisions and obtaining stable results are the most challenging tasks because of rapid variation in the real world. Clustering is a possible solution to these problems. The goal of cluster analysis is to group similar objects. AIS algor...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM), a fuzzy learning scheme inspired by some behavioral features of human brain functionality. The high-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Publication date: 2016